NBA All-Star Success Analysis¶
Summary:¶
The National Basketball Association (NBA) annually celebrates the best players in the league with its All-Star Game. Twenty-four players are selected each year, and fans have had a vote in the selection since the 1974-75 season. Under the current format, the starters are chosen by a weighted vote of 50% fans, 25% media, and 25% players.
These All-Stars are meant to represent the twenty-four best players in the game, but they are chosen by opinion. This study examines which statistics we as fans value in an All-Star, how those statistics relate to team success, and whether All-Stars are crucial for team success.
Three Research Questions:¶
1: What are the prominent statistics of an All-Star player?¶
For the first question, the prominent statistics can be based on the following:
Are All-Stars across the years in the top 5% (~Top 25) of NBA players for their respective season in the specific statistic?
The percentages of All-Stars that fit the above criteria will be displayed high-to-low, and the top 6 statistics will be denoted as the “prominent” ones.
I chose 6 statistics because that gives a decent variety of potential statistics without making the results too cluttered to read.
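A minimal sketch of the threshold check described above, assuming one row per player for a single season. The sample values, the All-Star names, and the helper name `share_of_all_stars_above_cutoff` are hypothetical and are not the notebook's actual implementation. The sketch uses `q=0.8` only because the sample has ten players; the study's criterion corresponds to `q=0.95`.

```python
import pandas as pd

# Hypothetical mini-sample: one season, one statistic (points per game).
players = pd.DataFrame({
    "season": [2024] * 10,
    "player": [f"Player {i}" for i in range(1, 11)],
    "pts_per_game": [30.1, 27.4, 25.0, 19.8, 15.2, 12.0, 9.5, 8.1, 6.3, 4.0],
})
all_star_names = ["Player 1", "Player 2", "Player 4"]  # hypothetical selections


def share_of_all_stars_above_cutoff(season_df, all_star_names, stat, q=0.95):
    """Return the fraction of All-Stars at or above the q-th quantile of `stat`."""
    cutoff = season_df[stat].quantile(q)
    is_all_star = season_df["player"].isin(all_star_names)
    above_cutoff = season_df.loc[is_all_star, stat] >= cutoff
    return above_cutoff.mean()


share = share_of_all_stars_above_cutoff(players, all_star_names, "pts_per_game", q=0.8)
```

Repeating this for every statistic and sorting the resulting shares from high to low would give the kind of ranking from which the top 6 are taken.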
After this study, it was found that, while the statistics change over the years, the 6 prominent statistics of All-Star players in 2025 are:
1: Free-Throw Attempts Per Game: The number of free-throws a player attempts per game played.
2: Points Per Game: The number of points scored per game played.
3: Field Goals Per Game: The number of shots made per game played.
4: Field Goal Attempts Per Game: The number of shots taken per game played.
5: Win Shares: Estimates how much a player contributes to their team winning, based on contributions offensively and defensively.
6: Two-Pointers Attempted Per Game: The number of two-pointers attempted per game by a player.
When looking at every season's prominent stats, the 6 most frequent statistics came out to be:
1: Win Shares: Estimates how much a player contributes to their team winning, based on contributions offensively and defensively.
2: Field Goals Per Game: The number of shots made per game played.
3: Points Per Game: The amount of points scored per game played.
4: Offensive Win Shares: Estimates how much a player contributes to their team winning, based on contributions offensively.
5: Free-Throws Per Game: The number of free-throws a player makes per game played.
6: Tie between: PER (Player Efficiency Rating) and Field Goal Attempts Per Game
These statistics denote the most important individual statistics for players who aspire to be All-Stars in the National Basketball Association. If a player performs really well in all of them next season, for example, that player has a strong case to be an All-Star for that season.
Is this surprising?¶
These answers are partially surprising, and they show a bias that NBA fans, media, and players have towards offensive production over defensive production. Nearly all of the statistics in the top 6 for this season and for all of the seasons combined are offense-based, such as Offensive Win Shares, Points Per Game, and Field Goals Per Game. Win Shares and PER are partial exceptions, since they also credit some defensive production. Defensive statistics very rarely appear in the top 6 prominent statistics for a season. This does make sense, though, as it's easy to say that a player who produces a lot of points is an elite player in the NBA and deserves a spot in the All-Star game.
2: How do prominent statistics transfer to team success?¶
For the second question, we will look deeper into these six statistics by finding the teams across the years that performed best and worst in them as a unit, and relating that to their success in those years. This is done season by season: each team is ranked in the six prominent stats for that season, which gives us a ranking that we can compare to a ranking of success based on the following:
- Win count in the season (1 point per win)
- If a team made the playoffs (5 points)
- If a team won the championship that season (10 points)
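The scoring rule above can be sketched as a small function. This is an illustration under one stated assumption: a champion also receives the playoff bonus, since winning the title implies a playoff berth. The function name and the example win totals are hypothetical.

```python
def season_score(wins, made_playoffs, won_championship):
    """Season score: 1 point per win, +5 for a playoff berth, +10 for a title.

    Assumption: the playoff and championship bonuses stack.
    """
    score = wins
    if made_playoffs:
        score += 5
    if won_championship:
        score += 10
    return score


# Hypothetical seasons: a lottery team, a playoff team, and a champion.
lottery_team = season_score(30, made_playoffs=False, won_championship=False)
playoff_team = season_score(57, made_playoffs=True, won_championship=False)
champion = season_score(64, made_playoffs=True, won_championship=True)
```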
In this question, I ranked the teams each season by their average placement in the 6 prominent stats compared to the other teams that season, and graphed the correlation between these rankings and the season scores that were calculated using the criteria above. In this graph, there was a clear positive correlation between the two. This means that, to answer this question...
Teams that are performing better in the six prominent statistics compared to the rest of the league generally find better success.¶
This means that, if you see a team performing really well in all of the prominent statistics in a season, you can get a sense of how well that team will do regarding win count, playoff berth, and championships.
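A rough sketch of the comparison described above, using a hypothetical four-team season and only two of the prominent statistics. The team names and values are made up for illustration. One detail worth noting: because rank 1 is the best placement, teams that place well and score well produce a negative coefficient between `avg_rank` and `season_score`, even though the relationship between "placing well" and "succeeding" is a positive one.

```python
import pandas as pd

# Hypothetical single-season team table.
teams = pd.DataFrame({
    "team": ["A", "B", "C", "D"],
    "pts_per_game": [118.0, 112.5, 109.0, 104.2],
    "fg_per_game": [43.1, 42.0, 40.5, 39.0],
    "season_score": [75, 58, 41, 25],
})
stat_cols = ["pts_per_game", "fg_per_game"]

# Rank each statistic within the season (1 = best), then average the ranks per team.
ranks = teams[stat_cols].rank(ascending=False)
teams["avg_rank"] = ranks.mean(axis=1)

# Pearson correlation between average placement and season score.
corr = teams["avg_rank"].corr(teams["season_score"])
```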
Is this surprising?¶
This answer isn't too surprising, but it is important to note that statistics that are important for individual success are also tied to team success. When we relate this to the results found in question one, this could also mean that offense-heavy teams typically find better success than defense-focused teams, as the prominent statistics that were most common in Research Question #1 were offense-based and produced while teams were on offense. Still, it makes sense that teams that perform better in the statistics that make players individually great find themselves in better positions in the season than teams that don't do as well in these statistics.
3: Do teams need All-Stars in order to find season success?¶
This third question is a check based on the first two questions. Once we have concluded the impact of an All-Star and how it relates to the success of the team, we can look at the success of all the teams compared to the number of All-Stars they had that season. Team success will be calculated the same way as in Research Question #2. These scores will be compared to how many All-Star players were on the team during the specific season, and compared with other teams throughout the years. This question is the most important to me because it is a debate I have had with plenty of people over the course of my lifetime, and I want to use the statistics to actually see whether individual players are key to success or whether success is based more on the team aspect of basketball.
In this study, I found that teams don't necessarily need an All-Star in order to perform well in a season, as shown by the 2013 Denver Nuggets, who had no All-Stars yet performed better in their season than the 2011 Boston Celtics, who had 4 All-Stars. However, an All-Star generally improves a team and gives it a higher chance of finding success in a season. Going further, the best teams of all time in terms of success scores had 2 All-Stars, with all of the top 3 Season Scores coming from teams that had exactly 2 All-Stars that season.
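A minimal sketch of how season scores could be summarized by All-Star count, assuming a table with one row per team-season. The column names `n_all_stars` and `season_score` and all values are hypothetical. The `mean` column speaks to the "floor" of each group, while `max` speaks to its "ceiling".

```python
import pandas as pd

# Hypothetical team-seasons with their All-Star counts and season scores.
team_seasons = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E", "F"],
    "n_all_stars": [0, 0, 1, 2, 2, 4],
    "season_score": [62, 30, 55, 88, 70, 66],
})

# Count, average, and best season score for each All-Star count.
summary = (
    team_seasons.groupby("n_all_stars")["season_score"]
    .agg(["count", "mean", "max"])
)
```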
Is this surprising?¶
The first part of the answer, where teams don't need an All-Star but definitely benefit from having one, isn't surprising to me. When a team has an All-Star, that means they have arguably one of the top 24 players in the league on their team, which is bound to have an impact on their success as a squad. A single player can generate enough production, offensively and defensively, to earn a team more wins, a playoff appearance, and sometimes even a championship. However, I was really surprised by the second part of the answer, where the best success scores come from teams with 2 All-Stars rather than teams that have 3 or even 4. Perhaps it is just a fluke, but it could also be because a team with a bunch of individually talented players may struggle to produce success at a team level. The prominent statistics are meant to represent why All-Stars on their own are better than the rest of the league, so if there are 4 players that are individually really great on the same team, there may be complications with playing together if they're all attempting to find success on their own. This would make 2 All-Stars the sweet spot for having enough individual talent while still being able to play as a team for a season.
Answer Summary¶
Overall, All-Stars are typically characterized by their elite performance in stats such as Win Shares, Field Goals Per Game, Points Per Game, Offensive Win Shares, Free-Throws Per Game, Player Efficiency Rating, and Field Goal Attempts Per Game. Not only that, but teams performing well in the prominent stats seem to perform better in a season compared to teams that do not perform as well. However, teams may not need an All-Star themselves in order to be successful. If they do have one, though, they generally seem to find more success than teams that are without an All-Star, with 2 All-Stars being the sweet spot for teams that want to be among the best teams ever.
Challenge Goals¶
Goal 1 - Statistical Hypothesis Testing:¶
Since we’re dealing with a lot of statistics that rely on each other, I can make a hypothesis for each of the three research questions. Each of these is supported by my own knowledge of the NBA and my current understanding of what people value. After the study, I'll reflect on these hypotheses and describe how I was surprised or not surprised by the answers.
PRE-STUDY HYPOTHESIS¶
The first hypothesis is that the statistics that will likely show up as a “prominent statistic” include points per game, assists per game, and rebounds per game. These are the most widely known stats among NBA fans because they are (1) the easiest to track and (2) the stats most often displayed on television and in post-game highlights. In terms of the other statistics, it’s a little bit of a toss-up, but I predict that they may include box plus-minus (how well or poorly your team does against the other team while you are on the court), true shooting percentage (which combines shooting on 2-pointers, 3-pointers, and free-throws), and win-shares per 48 minutes (how much a player contributes to their team winning). These stats, while a little complicated, are traditionally very high for All-Star level players and low for players that aren’t. For all of these, though, there are exceptions, and other stats could sneak in once all the data is compiled.
The second hypothesis, to relate to the second research question, is that stats such as win-shares, box plus-minus, and points will correlate to team success pretty easily. However, when it comes to assists, rebounds, and true-shooting percentage, it isn’t guaranteed that these will directly correlate to good teams. For example, a team could take a lot of individual shots per game, which would lower the total assist count for that team. They would still get a lot of points, but their assists as a team may be lower. These all rely on the assumption that my hypothesis for the first question is correct. If not, I do believe that, overall, prominent statistics will correlate to team success, as it would make sense that the stats that All-Star players are good at would connect to what is needed on a team level.
My third and final hypothesis regarding the third problem is that, while teams do not need an All-Star in order to find success, it generally helps and I predict that likely 90% of highly successful teams (Season Score of 60+) will have at least one All-Star, with maybe around 70% having two or more. In terms of wins, I think similar statistics will be found where teams with All-Stars will generally have more wins, with the number of wins increasing as more All-Stars appear on the team. This hypothesis doesn’t rely on the previous two questions, and there are no outside assumptions that are needed as we know what teams All-Stars were on each year and how teams did each year in terms of wins.
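The percentage predictions in the third hypothesis could be checked with a filter and a boolean mean. This is a sketch under assumptions: the column names and all values below are hypothetical, and "highly successful" is taken to mean a season score of at least 60, following the hypothesis.

```python
import pandas as pd

# Hypothetical team-seasons.
team_seasons = pd.DataFrame({
    "season_score": [85, 72, 64, 61, 45, 33],
    "n_all_stars": [2, 1, 0, 3, 1, 0],
})

# Keep only highly successful team-seasons (score of 60 or more).
high_success = team_seasons[team_seasons["season_score"] >= 60]

# The mean of a boolean column is the share of rows where it is True.
share_with_one_plus = (high_success["n_all_stars"] >= 1).mean()
share_with_two_plus = (high_success["n_all_stars"] >= 2).mean()
```

These two shares are the quantities the hypothesis predicts to be about 90% and 70%, respectively.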
POST-STUDY HYPOTHESIS REFLECTION¶
Looking back on my first hypothesis, I was shocked to see that assists per game AND rebounds per game were never a top 6 prominent stat for any season from 1975-2025. Usually, when I would look at the rankings for points, rebounds, and assists per game, I would see familiar names on each of them. For example, in this past season, Nikola Jokic was a top 3 player in points, rebounds, and assists per game. Based on this knowledge, I thought that all three of them would appear among the most frequent. However, I am happy that I predicted that box plus-minus, win shares, and true shooting percentage would make an appearance as a top 6 prominent stat for at least a single season. Win shares and box plus-minus were a big hit, as these stats made appearances in multiple seasons, while true shooting percentage only appeared once. Overall, I am happy with this first hypothesis, and I think I was fairly accurate with my judgment that was based on my own knowledge of the NBA.
Now, onto the second hypothesis. Since my first hypothesis was partly incorrect, this one is affected by that as well, since I based it partially on my prediction in the first hypothesis. With that being said, I did end up landing on the idea that prominent statistics in All-Stars would generally correlate to team success for those teams that perform well in these statistics on a team level. However, while stats like assists per game and rebounds per game were never looked at on a team level, I was correct in my hypothesis that some statistics wouldn't correlate to team success, even if they were prominent statistics. There were some statistics, such as age, minutes played, and defensive win shares, that didn't correlate to team success, and the rankings of teams in these statistics would be very different compared to statistics that did correlate, such as win shares and points per game. This makes sense, as the number of minutes a team plays in a season doesn't connect to team performance; it only varies based on whether teams find themselves in overtime situations. Overall, though, this hypothesis wasn't bad, and I think I had good judgment based on my NBA knowledge for this question, and the statistics surprised me (in a good way!).
Lastly, my third hypothesis was pretty spot on, which makes sense to me as I feel as if I had the most knowledge about this question compared to the other two. As I said in my hypothesis, teams do not need an All-Star, but one generally helps. My percentage predictions were nearly spot on, but there was only one team with no All-Stars that had a season score above 60, compared to the many, many more that had All-Stars on their squad. I did have an incorrect part of my prediction, though, as I claimed that, if a team has more All-Stars, they will get more wins. This isn't completely true: as my study found, the sweet spot for the number of All-Stars for a great team is 2. While having 3 or 4 raises the floor for team success, the ceiling was highest for teams with two All-Stars. Overall, though, I think this research question was easiest to predict, as I am fairly familiar with how good teams were year-to-year since around the 2010s. With this, I know that most successful teams (ones that had high win counts, playoff appearances, and championships) had at least one All-Star on their team. However, I also knew there were some great teams that had no All-Stars whatsoever. It was great to see the stats confirm this. That completes all three of my statistical hypotheses, and I think this challenge goal was more than achieved in this project!
Goal 2 - New Library (Plotly):¶
For this project, I used Plotly to turn the data I found into interactive visualizations where specific seasons, statistics, and teams can be seen, both to better represent my results and to make the notebook look cleaner. For each research question, I included multiple plots and graphs (10 total!), including bar graphs, line graphs, and scatter plots. For questions #2 and #3, these plots were key in not only helping visualize the data and the processing of data happening, but also in judging correlation between statistics, which would have been more difficult otherwise. For question #2, I plotted the correlation between my calculated season scores and the average ranking in prominent statistics. If I didn't have Plotly or any other graphing library, I wouldn't have been able to represent these findings properly and it would have been a lot more difficult to get a proper answer for my research questions. In question #3, a similar situation happened where I plotted the season success scores based on the # of All-Stars on the team for that season. This also showed correlation between having All-Stars and finding season success. Again, without Plotly, it would have been much harder to answer this question and to visualize my results.
My favorite plots in this project are the two correlation graphs mentioned above and the line plot showing team rankings for each of the 6 prominent statistics. I feel like these graphs not only were visually appealing, but also really helpful in terms of showcasing my data. For the other graphs, I tried to use similar color palettes for my bar graphs related to statistical differences, such as my graphs about the # of All-Stars who passed the threshold for a prominent statistic, and the graph that compares team success scores across teams in a season. For all of my graphs, though, I tried to incorporate data that hasn't been shown off yet so that each part of my study has some sort of representation that can go with it.
One nice thing about this project is that I feel as if I have a really good understanding of how Plotly works after finishing this study. It was great to learn the tool and there were a lot of helpful resources (the ones cited below) that were great at teaching how every kind of graph worked with Plotly. I feel very comfortable with Plotly and I believe all of my graphs are key in helping readers visualize my data and my findings! With all of this being said, I believe I achieved this challenge goal and have learned a new library!
Collaboration and Conduct¶
Students are expected to follow Washington state law on the Student Conduct Code for the University of Washington. In this course, students must:
- Indicate on your submission any assistance received, including materials distributed in this course.
- Not receive, generate, or otherwise acquire any substantial portion or walkthrough to an assessment.
- Not aid, assist, attempt, or tolerate prohibited academic conduct in others.
Update the following code cell to include your name and list your sources. If you used any kind of computer technology to help prepare your assessment submission, include the queries and/or prompts. Submitted work that is not consistent with sources may be subject to the student conduct process.
your_name = "Joey Reitz"
sources = [
## DATASHEETS
# Used for the datasets in this project.
"https://www.kaggle.com/datasets/sumitrodatta/nba-aba-baa-stats?select=All-Star+Selections.csv",
# Specific datasheets used from this set:
# - Advanced.csv
# - All-Star Selections.csv
# - Player Per Game.csv
# - Team Stats Per Game.csv
# - Team Summaries.csv
# Used for the NBA Finals and MVP.csv datasheet:
"https://www.kaggle.com/datasets/thedevastator/historical-nba-finals-and-mvp-results",
## SPECIFIC LECTURE/SECTION NOTEBOOKS LOOKED AT & THEIR DATE
# Used for reading the csv data,
"data-frames.ipynb, Data Frames lecture - April 14th",
# Used for narrowing down columns, getting data from the dataframes
"groupby-and-indexing, Groupby and Indexing lecture - April 16th",
# Used for merging the two stat datasheets to create one complete datasheet to use
"dissolve-intersect-and-join, Dissolve, Intersect, and Join lecture - May 14th",
## PLOTLY SOURCES:
# Used to learn PlotLy to represent my data.
"https://plotly.com/python/",
# Also used for learning PlotLy
"https://www.geeksforgeeks.org/python-plotly-tutorial/#how-to-install-plotly",
# Used to figure out colors when plotting
"https://plotly.com/python/builtin-colorscales/",
# Used to figure out line plots in plotly
"https://plotly.com/python/line-charts/",
# Used to learn about adding traces to a graph_objects figure
"https://plotly.com/python/creating-and-updating-figures/",
## OUTSIDE SOURCES
# Used to learn .quantile, a DataFrame method that allows me to get the 95th percentile of
# statistics in my data
"https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.quantile.html",
# Used to learn .isin(), a DataFrame method that allows me to narrow down rows in a dataset
# to only rows that also appear in a separate dataset (used in prominent_stats_counter).
"https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html",
]
assert your_name != "", "your_name cannot be empty"
assert ... not in sources, "sources should not include the placeholder ellipsis"
assert len(sources) >= 6, "must include at least 6 sources, inclusive of lectures and sections"
Data Setting and Methods¶
Data Sheets used (and what they provide):¶
The datasheets used in this study come from Sumitro Datta's NBA Stats dataset and the NBA Champion dataset made by user "The Devastator" (both are linked in the sources above).
Advanced.csv:¶
- Provides the advanced stats of every player in the NBA from the 1947-2025 seasons.
- Advanced stats are derived from box-score data through formulas (for example, Win Shares and PER) rather than counted directly, and they give a more detailed picture of a player's impact.
- Some spaces will be empty (NA) due to some stats not being recorded until later seasons.
All-Star Selections.csv:¶
- Provides the names of all of the All-Stars per season since 1951.
Player Per Game.csv:¶
- Provides the basic stats of every player in the NBA from the 1947-2025 seasons.
- Basic stats are the stats that are typically shown in box scores and are based on per-game statistics.
- Some spaces will be empty (NA) due to some stats not being recorded until later seasons.
Team Stats Per Game.csv:¶
- Provides the basic stats of every team in the NBA from the 1947-2025 seasons.
- Basic stats are the team totals that are typically shown in box scores and are based on per-game statistics.
- Some spaces will be empty (NA) due to some stats not being recorded until later seasons.
Team Summaries.csv:¶
- Provides a few of the advanced statistics for NBA teams that are able to be tracked by this study.
- Provides some advanced team stats that are connected to individual statistics of players.
NBA Finals and MVP.csv:¶
- Provides all of the NBA Champions from the 1950-2024 seasons and the year they won (2025 NBA champion undecided as of 06/09/2025).
Changes to each Data Sheet (and why):¶
FOR ALL:¶
- Narrow the league down to the "NBA" only, so no ABA seasons are included.
- Narrow the data down to seasons from 1975 onward, as the 1974-75 season was the first season where fans voted for the All-Stars.
Advanced.csv:¶
- Removed the columns representing the birth year and the position of players, as this project does not use birth years and treats the study as positionless.
- Removed the VORP stat, as there is no good team-equivalent.
All-Star Selections.csv:¶
- Removed the All-Star team column and the column representing if a player was replaced, as they are unrelated to any stats and have no effect on the games.
Player Per Game.csv:¶
- Removed the columns for birth year and position of players for same reasons as Advanced.csv.
Team Stats Per Game.csv:¶
- Removed the games played for each team, as it won't be necessary for calculating team success.
Team Summaries.csv:¶
- Removed stats regarding opponent scores, removed arena details, and removed some general stats that aren't applicable to the study.
NBA Finals and MVP.csv:¶
- Only kept the year and name of the team that won the championship for the given year as those are the only pieces of information given that will be useful for this study.
After this cleaning, I will merge the Player Per Game.csv dataset with the Advanced.csv dataset to get a single datasheet for player statistics, and do the same for Team Stats Per Game.csv and Team Summaries.csv.
Code: Initializing and cleaning the individual data sheets.¶
# First, install dependencies
!pip install plotly
# Next, import all packages used
import matplotlib.pyplot as plt
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from pandas.testing import assert_series_equal
# Reading csvs
advanced = pd.read_csv("Advanced.csv")
all_stars = pd.read_csv("All-Star Selections.csv")
nba_champions = pd.read_csv("NBA Finals and MVP.csv")
player_per_game = pd.read_csv("Player Per Game.csv")
team_per_game = pd.read_csv("Team Stats Per Game.csv")
team_advanced = pd.read_csv("Team Summaries.csv")
# Cleaning NBA Finals and MVP.csv *HANDLED SEPARATELY DUE TO DIFFERENT LABELS IN DATASHEET
nba_champions = nba_champions[["Year", "NBA Champion"]]
nba_champions = nba_champions[nba_champions["Year"] >= 1975]
# Cleaning Advanced.csv
advanced = advanced[(advanced["season"] >= 1975) & (advanced["lg"] == "NBA")]
advanced = advanced.drop(["birth_year", "pos", "lg", "vorp"], axis=1)
# Cleaning All-Star Selections.csv
all_stars = all_stars[(all_stars["season"] >= 1975) & (all_stars["lg"] == "NBA")]
all_stars = all_stars.drop(["team", "replaced", "lg"], axis=1)
# Cleaning Player Per Game.csv
player_per_game = player_per_game[(player_per_game["season"] >= 1975) &
                                  (player_per_game["lg"] == "NBA")]
player_per_game = player_per_game.drop(["birth_year", "pos", "lg"], axis=1)
# Cleaning Team Stats Per Game.csv
team_per_game = team_per_game[(team_per_game["season"] >= 1975) &
                              (team_per_game["lg"] == "NBA")]
team_per_game = team_per_game.drop(["lg", "g"], axis=1)
# Cleaning Team Summaries.csv
team_advanced = team_advanced[(team_advanced["season"] >= 1975) &
                              (team_advanced["lg"] == "NBA")]
team_advanced = team_advanced[["season", "team", "abbreviation", "age", "w", "l", "o_rtg",
                               "d_rtg", "n_rtg", "pace", "f_tr", "x3p_ar", "ts_percent",
                               "tov_percent", "orb_percent", "e_fg_percent"]]
# Merging player_per_game and advanced into one single player stats sheet,
# matching rows on the identifying columns the two sheets share
columns = ["seas_id", "season", "player_id", "player", "age", "experience", "tm", "g"]
player_stats = player_per_game.merge(advanced, on=columns)
# Merging team_per_game and team_advanced into one single team stats sheet
columns = ["season", "team", "abbreviation"]
team_stats = team_per_game.merge(team_advanced, on=columns)
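Because the merges above use the default inner join, any row whose key values differ between the two sheets is silently dropped. A small sketch of one way to check for that, using hypothetical two-row frames rather than the real datasheets: `validate="one_to_one"` makes pandas raise an error if a key is duplicated on either side, and `indicator=True` adds a `_merge` column showing where each row came from.

```python
import pandas as pd

# Hypothetical stand-ins for the per-game and advanced sheets.
per_game = pd.DataFrame({
    "season": [2024, 2024],
    "player_id": [1, 2],
    "pts_per_game": [25.9, 11.0],
})
adv = pd.DataFrame({
    "season": [2024, 2024],
    "player_id": [1, 3],
    "ws": [7.5, 0.8],
})

# An outer join keeps unmatched rows so they can be inspected.
merged = per_game.merge(
    adv,
    on=["season", "player_id"],
    how="outer",
    validate="one_to_one",
    indicator=True,
)

# Rows found in only one sheet would be lost in an inner join.
unmatched = merged[merged["_merge"] != "both"]
```

If `unmatched` is empty on the real data, the inner join loses nothing.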
Each Stat being recorded and their meaning:¶
In total, we are looking at 52 different statistics. This is a lot, and some have confusing names, so let's go through each of them and what they represent!
Basic statistics (from Player Per Game.csv)¶
- age: The age of the player, in years.
- experience: The number of years a player has been playing in the NBA.
- g: Games played in a season
- gs: Games started in a season, meaning games where players were on the court at the beginning of a game.
- mp_per_game: Minutes played per game, the # of minutes someone is playing on the court on average for the season.
- fg_per_game: Field goals per game, the # of shots a player makes in a game, not counting free-throws.
- fga_per_game: Field goal attempts per game, the # of shots a player takes in a game, not counting free-throws.
- fg_percent: Field goal percent, the share of field goal attempts that are made.
- x3p_per_game: # of three-pointers made per game by a player.
- x3pa_per_game: # of three-pointers attempted per game by a player.
- x3p_percent: Percentage of three-point attempts made.
- x2p_per_game: # of two-pointers made per game by a player.
- x2pa_per_game: # of two-pointers attempted per game by a player.
- x2p_percent: Percentage of two-point attempts made.
- e_fg_percent: Effective field-goal percentage, field goal percentage that accounts for three-pointers being more valuable.
- ft_per_game: # of free-throws made per game by a player.
- fta_per_game: # of free-throws attempted per game by a player.
- ft_percent: Percentage of free-throw attempts made.
- orb_per_game: Offensive rebounds per game by a player.
- drb_per_game: Defensive rebounds per game by a player.
- trb_per_game: Total rebounds per game by a player.
- ast_per_game: Assists per game by a player.
- stl_per_game: Steals per game by a player.
- blk_per_game: Blocks per game by a player.
- tov_per_game: Turnovers per game by a player.
- pf_per_game: Personal fouls per game charged on a player.
- pts_per_game: Points scored per game by a player.
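As a worked example of one of the less obvious entries above, effective field-goal percentage is commonly computed as (FG + 0.5 × 3P) / FGA, so each made three-pointer counts as one and a half made shots. The function name and the sample numbers below are hypothetical.

```python
def effective_fg_percent(fg, x3p, fga):
    """eFG% = (FG + 0.5 * 3P) / FGA; returns None when there are no attempts."""
    if fga == 0:
        return None
    return (fg + 0.5 * x3p) / fga


# A player making 8 of 18 shots, 2 of them three-pointers:
# plain FG% is 8 / 18, while eFG% credits the extra value of the threes.
plain_fg = 8 / 18
efg = effective_fg_percent(fg=8, x3p=2, fga=18)
```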
Advanced Statistics (from Advanced.csv)¶
- mp: Minutes played total in a season.
- per: Player Efficiency Rating, a per-minute summary of a player's box-score production, scaled so that the league average is 15.
- ts_percent: True Shooting Percentage, a measure of shooting efficiency that accounts for two-pointers, three-pointers, and free-throws.
- x3p_ar: Three-point attempt rate, the share of a player's field goal attempts that are three-pointers.
- f_tr: Free-throw rate, the number of free-throw attempts per field goal attempt.
- orb_percent: The estimated percentage of available offensive rebounds a player grabs while on the court.
- drb_percent: The estimated percentage of available defensive rebounds a player grabs while on the court.
- trb_percent: The estimated percentage of available total rebounds a player grabs while on the court.
- ast_percent: The estimated percentage of teammates' field goals a player assisted while on the court.
- stl_percent: The estimated percentage of opponent possessions that end with a steal by the player while on the court.
- blk_percent: The estimated percentage of opponent two-point attempts blocked by the player while on the court.
- tov_percent: The estimated number of turnovers a player commits per 100 plays.
- usg_percent: Usage percentage, the estimated percentage of team plays a player uses (by shooting, getting to the free-throw line, or turning the ball over) while on the court.
- ows: Offensive Win Shares, estimates how much a player contributes to their team winning, based on contributions offensively.
- dws: Defensive Win Shares, estimates how much a player contributes to their team winning, based on contributions defensively.
- ws: Win Shares, estimates how much a player contributes to their team winning, based on contributions offensively and defensively.
- ws_48: Win Shares per 48 minutes, win shares scaled to a full 48 minutes of play (total minutes in regulation).
- obpm: Offensive Box Plus-Minus, a box-score estimate of the points per 100 possessions a player contributes on offense above a league-average player.
- dbpm: Defensive Box Plus-Minus, the same estimate for a player's contribution on defense.
- bpm: Box Plus-Minus, the combined offensive and defensive estimate.
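Two of the entries above have simple closed forms that can be sketched directly. True shooting percentage is commonly published as PTS / (2 × (FGA + 0.44 × FTA)), and `ws_48` is win shares divided by minutes played, times 48. The function names are hypothetical. The second example uses the Anthony Edwards 2024 row from this notebook's test data (79 games, 35.1 minutes per game, 7.5 win shares) and approximates total minutes as games × minutes per game, which is an assumption since the rounded per-game figure is not the exact `mp` value.

```python
def true_shooting_percent(pts, fga, fta):
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)); returns None with no attempts."""
    attempts = fga + 0.44 * fta
    if attempts == 0:
        return None
    return pts / (2 * attempts)


def win_shares_per_48(ws, minutes_played):
    """Win shares scaled to 48 minutes; returns None with no minutes played."""
    if minutes_played == 0:
        return None
    return ws / minutes_played * 48


# Hypothetical stat line: 20 points on 15 field goal attempts and 5 free-throw attempts.
ts = true_shooting_percent(pts=20, fga=15, fta=5)

# Approximate total minutes from per-game figures (assumption, see lead-in).
approx_minutes = 79 * 35.1
edwards_ws_48 = win_shares_per_48(7.5, approx_minutes)  # roughly 0.130, in line with the ws_48 column
```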
Team-only Statistics (From Team Summaries.csv and Team Stats Per Game.csv)¶
- o_rtg: Offensive Rating: Points a team scores per 100 possessions (Higher the better).
- d_rtg: Defensive Rating: Points a team allows per 100 possessions (Lower the better).
- n_rtg: Net Rating: Offensive rating - defensive rating, how good a team is overall (Higher the better).
- pace: Average number of possessions a team has in a typical 48 minute game.
- f_tr: Free-throw rate: The number of free-throws a team attempts per field goal attempt.
Testing Data¶
In order to make sure the program is working properly, I made this basic stat sheet using rows from the large datasheet in order to test for effectiveness. I included players that played in 2024 and/or 2023, which are the seasons we will be using for the tests. I also picked six random teams that had different performances in the 2023 and 2024 seasons, which will be used for testing for the later research questions that involve team statistics.
# Random Assortment of players, including some All-Stars
test_data = player_stats[(player_stats["player"] == "Anthony Edwards") |
(player_stats["player"] == "Victor Wembanyama") |
(player_stats["player"] == "Ja Morant") |
(player_stats["player"] == "LaMelo Ball") |
(player_stats["player"] == "Scottie Barnes") |
(player_stats["player"] == "Herbert Jones") |
(player_stats["player"] == "Paolo Banchero") |
(player_stats["player"] == "Kobe Bufkin")]
# Random Assortment of teams
team_test_data = team_stats[(team_stats["team"] == "Boston Celtics") |
(team_stats["team"] == "Detroit Pistons") |
(team_stats["team"] == "Miami Heat") |
(team_stats["team"] == "Minnesota Timberwolves") |
(team_stats["team"] == "San Antonio Spurs") |
(team_stats["team"] == "Washington Wizards")]
# Narrow down to just the seasons being used
test_data = test_data[(test_data["season"] == 2024) |
(test_data["season"] == 2023)]
team_test_data = team_test_data[(team_test_data["season"] == 2024) |
(team_test_data["season"] == 2023)]
# Player Test Data
test_data
| | seas_id | season | player_id | player | age | experience | tm | g | gs | mp_per_game | ... | blk_percent | tov_percent | usg_percent | ows | dws | ws | ws_48 | obpm | dbpm | bpm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 770 | 31171.0 | 2024 | 4808.0 | Anthony Edwards | 22.0 | 4 | MIN | 79 | 78.0 | 35.1 | ... | 1.3 | 11.9 | 32.3 | 2.9 | 4.7 | 7.5 | 0.130 | 2.7 | 0.5 | 3.3 |
| 984 | 31385.0 | 2024 | 4939.0 | Herbert Jones | 25.0 | 3 | NOP | 76 | 76.0 | 30.5 | ... | 2.6 | 12.4 | 14.1 | 3.3 | 3.0 | 6.3 | 0.131 | -0.9 | 1.5 | 0.6 |
| 1007 | 31408.0 | 2024 | 4723.0 | Ja Morant | 24.0 | 5 | MEM | 9 | 9.0 | 35.3 | ... | 1.5 | 12.0 | 30.4 | 0.5 | 0.3 | 0.8 | 0.124 | 2.7 | 0.4 | 3.1 |
| 1166 | 31567.0 | 2024 | 5172.0 | Kobe Bufkin | 20.0 | 1 | ATL | 17 | 0.0 | 11.5 | ... | 2.3 | 9.5 | 22.4 | -0.3 | 0.1 | -0.1 | -0.031 | -4.3 | -0.2 | -4.5 |
| 1177 | 31578.0 | 2024 | 4858.0 | LaMelo Ball | 22.0 | 4 | CHO | 22 | 22.0 | 32.3 | ... | 0.5 | 15.0 | 34.3 | 0.6 | 0.5 | 1.1 | 0.074 | 3.8 | -0.5 | 3.3 |
| 1299 | 31700.0 | 2024 | 5089.0 | Paolo Banchero | 21.0 | 2 | ORL | 80 | 80.0 | 35.0 | ... | 1.6 | 13.0 | 29.7 | 1.3 | 4.0 | 5.3 | 0.090 | 1.3 | 0.0 | 1.3 |
| 1361 | 31762.0 | 2024 | 5006.0 | Scottie Barnes | 22.0 | 3 | TOR | 60 | 60.0 | 34.9 | ... | 3.7 | 13.6 | 24.8 | 2.3 | 2.0 | 4.3 | 0.098 | 2.9 | 0.8 | 3.7 |
| 1449 | 31850.0 | 2024 | 5209.0 | Victor Wembanyama | 20.0 | 1 | SAS | 71 | 71.0 | 29.7 | ... | 10.0 | 16.2 | 32.2 | -0.7 | 4.4 | 3.7 | 0.085 | 1.9 | 3.3 | 5.2 |
| 1495 | 30483.0 | 2023 | 4808.0 | Anthony Edwards | 21.0 | 3 | MIN | 79 | 79.0 | 36.0 | ... | 1.8 | 13.0 | 29.9 | 0.2 | 3.6 | 3.8 | 0.064 | 1.0 | 0.0 | 1.0 |
| 1692 | 30680.0 | 2023 | 4939.0 | Herbert Jones | 24.0 | 2 | NOP | 66 | 66.0 | 29.6 | ... | 2.1 | 13.3 | 14.5 | 1.5 | 2.7 | 4.2 | 0.104 | -2.0 | 1.7 | -0.3 |
| 1708 | 30696.0 | 2023 | 4723.0 | Ja Morant | 23.0 | 4 | MEM | 61 | 59.0 | 31.9 | ... | 0.7 | 12.6 | 34.9 | 3.4 | 2.6 | 6.0 | 0.148 | 5.2 | 0.5 | 5.7 |
| 1879 | 30867.0 | 2023 | 4858.0 | LaMelo Ball | 21.0 | 3 | CHO | 36 | 36.0 | 35.2 | ... | 0.8 | 14.3 | 30.0 | 0.6 | 1.2 | 1.8 | 0.068 | 3.2 | -0.8 | 2.4 |
| 1994 | 30982.0 | 2023 | 5089.0 | Paolo Banchero | 20.0 | 1 | ORL | 72 | 72.0 | 33.8 | ... | 1.5 | 12.8 | 27.5 | -0.3 | 2.6 | 2.4 | 0.047 | -0.7 | -0.7 | -1.5 |
| 2053 | 31041.0 | 2023 | 5006.0 | Scottie Barnes | 21.0 | 2 | TOR | 77 | 76.0 | 34.8 | ... | 2.2 | 12.0 | 20.3 | 2.3 | 2.7 | 5.0 | 0.090 | 0.5 | -0.1 | 0.4 |
14 rows × 52 columns
# Team Test Data
team_test_data
| | season | team | abbreviation | playoffs | mp_per_game | fg_per_game | fga_per_game | fg_percent | x3p_per_game | x3pa_per_game | ... | o_rtg | d_rtg | n_rtg | pace | f_tr | x3p_ar | ts_percent | tov_percent | orb_percent | e_fg_percent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32 | 2024 | Boston Celtics | BOS | True | 241.8 | 43.9 | 90.2 | 0.487 | 16.5 | 42.5 | ... | 123.2 | 111.6 | 11.6 | 97.2 | 0.224 | 0.471 | 0.609 | 10.8 | 24.9 | 0.578 |
| 39 | 2024 | Detroit Pistons | DET | False | 240.9 | 40.9 | 88.2 | 0.463 | 11.0 | 31.7 | ... | 109.7 | 118.8 | -9.1 | 99.8 | 0.246 | 0.360 | 0.562 | 13.5 | 23.9 | 0.526 |
| 46 | 2024 | Miami Heat | MIA | True | 240.9 | 39.8 | 85.6 | 0.465 | 12.5 | 33.7 | ... | 114.0 | 112.2 | 1.8 | 96.2 | 0.257 | 0.394 | 0.578 | 11.7 | 21.8 | 0.538 |
| 48 | 2024 | Minnesota Timberwolves | MIN | True | 241.5 | 41.3 | 85.0 | 0.485 | 12.6 | 32.7 | ... | 115.6 | 109.0 | 6.6 | 97.1 | 0.270 | 0.384 | 0.594 | 13.0 | 23.2 | 0.559 |
| 57 | 2024 | San Antonio Spurs | SAS | False | 241.8 | 41.9 | 90.7 | 0.462 | 12.6 | 36.4 | ... | 110.0 | 116.4 | -6.4 | 101.1 | 0.220 | 0.401 | 0.563 | 13.2 | 22.9 | 0.532 |
| 60 | 2024 | Washington Wizards | WAS | False | 240.6 | 43.0 | 91.4 | 0.470 | 12.4 | 35.5 | ... | 110.5 | 119.6 | -9.1 | 102.7 | 0.221 | 0.389 | 0.567 | 12.2 | 20.0 | 0.538 |
| 63 | 2023 | Boston Celtics | BOS | False | 243.7 | 42.2 | 88.8 | 0.475 | 16.0 | 42.6 | ... | 118.0 | 111.5 | 6.5 | 98.5 | 0.243 | 0.480 | 0.600 | 12.0 | 22.1 | 0.566 |
| 70 | 2023 | Detroit Pistons | DET | False | 241.5 | 39.6 | 87.1 | 0.454 | 11.4 | 32.4 | ... | 110.7 | 118.9 | -8.2 | 99.0 | 0.295 | 0.372 | 0.561 | 13.3 | 24.9 | 0.520 |
| 77 | 2023 | Miami Heat | MIA | False | 241.5 | 39.2 | 85.3 | 0.460 | 12.0 | 34.8 | ... | 113.0 | 113.3 | -0.3 | 96.3 | 0.270 | 0.408 | 0.574 | 12.4 | 22.8 | 0.530 |
| 79 | 2023 | Minnesota Timberwolves | MIN | False | 241.8 | 42.9 | 87.4 | 0.490 | 12.2 | 33.3 | ... | 113.7 | 113.8 | -0.1 | 101.0 | 0.271 | 0.381 | 0.592 | 13.6 | 21.5 | 0.560 |
| 88 | 2023 | San Antonio Spurs | SAS | False | 242.1 | 43.1 | 92.6 | 0.465 | 11.1 | 32.2 | ... | 110.2 | 120.0 | -9.8 | 101.6 | 0.229 | 0.348 | 0.554 | 13.0 | 25.6 | 0.525 |
| 91 | 2023 | Washington Wizards | WAS | False | 240.9 | 42.1 | 86.9 | 0.485 | 11.3 | 31.7 | ... | 114.4 | 115.6 | -1.2 | 98.5 | 0.258 | 0.365 | 0.585 | 12.7 | 22.6 | 0.550 |
12 rows × 39 columns
Now that we have all the stats that we're working with, let's get to the results!¶
Results¶
RESEARCH QUESTION #1: What are the prominent statistics of All-Stars?¶
All-Stars are meant to represent players who stand above and beyond their peers. However, there is no complete way to quantify what "above and beyond" means. That is what this question tries to answer: I look at how All-Stars' statistics compare to those of their peers, and which statistics most often set them apart.
Part 1: Beginning Steps - Calculate Thresholds¶
We have our stat sheets, but we currently have no way to make meaning of the data. To start, if we want to figure out which All-Stars are in the top 5% of players (the 95th percentile) for a certain statistic, we have to figure out the marker, or threshold, that a player needs to reach in that statistic to be in the top 5%. This can be done with:
get_stat_thresholds:¶
This program returns a series that contains the minimum value needed for each stat to sit at a given percentile of players in the NBA. For our purposes, we will use the 95th percentile (.95), which marks the top 5% of players, but this program could theoretically take any percentile and return the thresholds for all of the statistics.
def get_stat_thresholds(data, season, threshold):
    """
    Given an NBA statistical dataset, a season, and a threshold, returns a series of the number
    needed for any player to surpass that threshold in a season (be in that threshold percentile).
    """
    curr_season = data[data["season"] == season]
    curr_season = curr_season.drop(["seas_id", "season", "player_id", "player", "tm"], axis=1)
    return curr_season.quantile(threshold)
# TEST PORTION - Changing thresholds with each to ensure correctness
assert get_stat_thresholds(test_data, 2024, 0.0)["g"] == 9.0, "Incorrect Threshold"
assert get_stat_thresholds(test_data, 2024, 1.0)["ws"] == 7.5, "Incorrect Thresholds"
assert get_stat_thresholds(test_data, 2024, 0.5)["bpm"] == 3.2, "Incorrect Threshold"
assert get_stat_thresholds(test_data, 2023, 0.25)["mp"] == 1948.75, "Incorrect Threshold"
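To see where a value like the `bpm` threshold of 3.2 comes from, here is a minimal pure-Python sketch of a quantile with linear interpolation between the two nearest sorted values, which is the default behavior of pandas' `quantile`. The helper name `percentile` is an assumption for this illustration and is not part of the notebook; the `bpm` values are copied from the 2024 rows of the player test data.

```python
def percentile(values, q):
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation
    between the two nearest sorted values."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)      # fractional index into the sorted list
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# 2024 bpm values of the eight test players, copied from the player test table.
bpm_2024 = [3.3, 0.6, 3.1, -4.5, 3.3, 1.3, 3.7, 5.2]

median_bpm = percentile(bpm_2024, 0.5)   # halfway between 3.1 and 3.3
max_bpm = percentile(bpm_2024, 1.0)      # the largest value in the list
print(round(median_bpm, 2), max_bpm)
```

With eight players, the 50th percentile falls between the fourth and fifth sorted values, which is why the threshold does not have to equal any single player's stat.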
Great! This gives us the statistical thresholds for any nth percentile, so let's visualize this with a graph!¶
Using Plotly, let's graph how the threshold for being in the top 5% (95th percentile) of points, rebounds, and assists changes through the 1975-2025 seasons! We will also be establishing our DataFrame that gives us the thresholds for each season, which we will use for the next part of this question!
season_thresholds = pd.DataFrame()
for season in range(2025, 1974, -1):
    season_thresholds[season] = get_stat_thresholds(player_stats, season, .95)
fig = go.Figure()
for stat in ["pts_per_game", "trb_per_game", "ast_per_game"]:
    fig.add_trace(go.Scatter(
        x=season_thresholds.columns,
        y=season_thresholds.loc[stat].values,
        mode="lines",
        name=stat[0:3]
    ))
fig.update_layout(title="95th Percentile in Points, Rebounds, and Assists throughout the seasons",
                  xaxis_title="Season", yaxis_title="Stat Count")
fig.show(renderer="notebook")
As shown above, the 95th percentile in points is much higher than in rebounds and assists, with assists having the lowest value required to be in the top 5%!
Now, what else can we do with this data?¶
For the upcoming methods, let's establish our testing set. For this, we will just use the 2023 and 2024 seasons and use the 50th percentile as the threshold (since there are few players in the testing set).
test_thresholds = pd.DataFrame()
test_thresholds[2023] = get_stat_thresholds(test_data, 2023, .5)
test_thresholds[2024] = get_stat_thresholds(test_data, 2024, .5)
To get a sense of how different the top 5% of players (95th percentile) are from a typical player (the 50th percentile, which is the league median), we can use a grouped bar graph based on the 2025 season.
Let's compare points, assists, and rebounds per game!
average_thresholds = get_stat_thresholds(player_stats, 2025, 0.5)
fig = go.Figure(data=[
go.Bar(name='Average', x=["Points", "Assists", "Rebounds"],
y=average_thresholds[["pts_per_game", "ast_per_game", "trb_per_game"]]),
go.Bar(name='Top 95%', x=["Points", "Assists", "Rebounds"],
y=season_thresholds.loc[["pts_per_game", "ast_per_game", "trb_per_game"], 2025])
])
fig.update_layout(barmode='group', yaxis_title="# Of Stat", xaxis_title="Stat",
title_text="Top 95% in PTS, AST, RBS Per Game VS League Average")
fig.show(renderer="notebook")
Part 2: Nearly There - Calculate Prominence¶
We now have the thresholds for each season for a player to be in the top 5% of the league for each statistic. We now have to check how many All-Stars are at or past this threshold for each season. This will tell us what the prominent statistics are!
prominent_stats_counter¶
This method, when given our data and a season, calculates the number of All-Stars for that season that pass each statistic threshold as a Series. The thresholds are passed in as an argument rather than hard-coded to the 95th percentile, which allows us to test our method properly!
def prominent_stats_counter(data, stars, thresholds, season):
    """
    Given the All-Star dataset, the minimum percentile thresholds, and a season, return a series
    that has the number of All-Stars for the given season that are at or above the given threshold
    for each statistic.
    """
    curr_thresholds = thresholds[season]
    curr_stars = stars[stars["season"] == season]
    curr_data = data[data["season"] == season]
    stars_stats = curr_data[curr_data["player"].isin(curr_stars["player"])]
    stars_stats = stars_stats.drop(["seas_id", "season", "player_id", "player", "tm"], axis=1)
    return (stars_stats >= curr_thresholds).sum()
# TEST PORTION
test_all_stars = all_stars[(all_stars["player"] == "Anthony Edwards") |
(all_stars["player"] == "Scottie Barnes") |
(all_stars["player"] == "LaMelo Ball") |
(all_stars["player"] == "Ja Morant") |
(all_stars["player"] == "Paolo Banchero")]
test_prominent_stats_counts = pd.DataFrame()
test_prominent_stats_counts[2024] = prominent_stats_counter(test_data, test_all_stars, test_thresholds, 2024)
test_prominent_stats_counts[2023] = prominent_stats_counter(test_data, test_all_stars, test_thresholds, 2023)
assert test_prominent_stats_counts.loc["ws", 2024] == 3
assert test_prominent_stats_counts.loc["pts_per_game", 2024] == 2
assert test_prominent_stats_counts.loc["mp", 2023] == 1
assert test_prominent_stats_counts.loc["mp", 2024] == 2
assert test_prominent_stats_counts.loc["bpm", 2023] == 2
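The first assertion above can be reproduced by hand. The sketch below works through the 2024 win shares (`ws`) count in plain Python, using the values from the player test table. It assumes that the test players who were All-Stars in 2024 are Anthony Edwards, Paolo Banchero, and Scottie Barnes; that list and the helper name `count_at_or_above` are assumptions made for this illustration.

```python
from statistics import median

# 2024 win shares for the eight test players, copied from the player test table.
ws_2024 = {
    "Anthony Edwards": 7.5, "Herbert Jones": 6.3, "Ja Morant": 0.8,
    "Kobe Bufkin": -0.1, "LaMelo Ball": 1.1, "Paolo Banchero": 5.3,
    "Scottie Barnes": 4.3, "Victor Wembanyama": 3.7,
}

# Assumption: the test players selected as All-Stars in 2024.
stars_2024 = ["Anthony Edwards", "Paolo Banchero", "Scottie Barnes"]


def count_at_or_above(stats, stars, threshold):
    """Count how many of the given stars have a stat at or above the threshold."""
    return sum(1 for player in stars if stats[player] >= threshold)


# The test thresholds use the 50th percentile, which is the median.
ws_threshold = median(ws_2024.values())
stars_over = count_at_or_above(ws_2024, stars_2024, ws_threshold)
print(ws_threshold, stars_over)
```

Under that assumption all three All-Stars clear the median, which matches the count of 3 asserted for `ws` in 2024.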
With this new program, we can create a datasheet of the number of All-Stars that pass the prominence threshold for each season!¶
prominent_stats_counts = pd.DataFrame()
for season in range(2025, 1974, -1):
    prominent_stats_counts[season] = prominent_stats_counter(player_stats, all_stars,
                                                             season_thresholds, season)
Say we want to see the statistics in 2025 with the most All-Stars at or past the 95th-percentile threshold. We can graph it again!
fig = px.bar(prominent_stats_counts[2025], color=prominent_stats_counts[2025].values,
color_continuous_scale='sunsetdark',
title="2025 Prominent Statistics by # Of All-Stars in the top 95%")
fig.update_xaxes(tickangle=-45)
fig.update_layout(xaxis_title="Stat", yaxis_title="# of All-Stars")
fig.show(renderer="notebook")
Part 3: Final Step - Get the Prominent Statistics¶
We're nearly there! Now that we have the counts for each statistic, we just need to get the 6 largest counts for each season of the prominent_stats_counts, and those will be the six most individually important stats for All-Stars for the given season. If several stats tie for the last spot, we still keep exactly six.
get_prominent_stats¶
This program aims to get the top 6 stats that the most All-Stars excel in. There is a chance for ties between statistics within a season, and in that case, since I want to keep it to 6 statistics, the earliest occurring columns in the dataset that are tied for 6th are the ones that are kept.
def get_prominent_stats(counts, season):
    """
    Given a counter of the All-Stars who pass the prominent stats thresholds and a season, returns
    the top six stats with the most All-Stars at or above that stat threshold from that season. In
    the instance of a tie, takes the earliest apparent column statistic to maintain 6 statistics.
    """
    curr_counts = counts[season]
    minimum_threshold = curr_counts.quantile(1 - (6 / len(curr_counts)))
    return curr_counts[curr_counts >= minimum_threshold].sort_values(ascending=False)[0:6].index
# TEST PORTION
test_prominent_stats = pd.DataFrame()
test_prominent_stats[2024] = get_prominent_stats(test_prominent_stats_counts, 2024)
test_prominent_stats[2023] = get_prominent_stats(test_prominent_stats_counts, 2023)
assert test_prominent_stats.loc[0, 2024] == "mp_per_game"
assert test_prominent_stats.loc[2, 2024] == "ows"
assert test_prominent_stats.loc[3, 2023] == "obpm"
assert test_prominent_stats.loc[1, 2023] == "experience"
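The tie rule described above (keep exactly six, preferring the earliest columns) can be illustrated with a small pure-Python sketch. It is not the notebook's implementation: the function name `top_n_stats` and the counts in `example_counts` are hypothetical, chosen so that three stats tie around sixth place. Python's `sorted` is stable, so tied stats keep their original column order.

```python
def top_n_stats(counts, n=6):
    """Return the n stats with the highest counts.

    `counts` maps stat name -> number of All-Stars, in column order.
    sorted() is stable even with reverse=True, so ties keep column order.
    """
    ranked = sorted(counts, key=lambda stat: counts[stat], reverse=True)
    return ranked[:n]


# Hypothetical counts for illustration only (not real results).
example_counts = {
    "g": 3, "mp_per_game": 20, "pts_per_game": 22, "fg_per_game": 21,
    "fga_per_game": 19, "fta_per_game": 23, "ws": 19, "per": 19, "bpm": 12,
}

top_six = top_n_stats(example_counts)
print(top_six)
```

Here `fga_per_game`, `ws`, and `per` all have 19, but only the first two fit in the top six, so `per` is dropped. In pandas, `sort_values` uses a quicksort by default, which is not guaranteed to be stable; passing `kind="stable"` is one way to make the tie order explicit.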
Great! Now we can create a datasheet of the six prominent stats for each season, then we're done!¶
prominent_stats = pd.DataFrame()
for season in range(2025, 1974, -1):
    prominent_stats[season] = get_prominent_stats(prominent_stats_counts, season)
Now, we have all of the prominent statistics for any given season, and if we want to look at a specific season's prominent statistics, we can simply use the dataframe and access a season through that!
For the latest 2025 season, we can see that the most valued statistics by fans, media, and players are the following:
prominent_stats[2025]
0     fta_per_game
1     pts_per_game
2      fg_per_game
3     fga_per_game
4               ws
5    x2pa_per_game
Name: 2025, dtype: object
We have:
1: Free-Throw Attempts Per Game
2: Points Per Game
3: Field Goals Per Game
4: Field Goal Attempts Per Game
5: Win Shares
6: Two-Pointers Attempted Per Game
With this new program as well, we can look at every season's prominent stats, and see which ones appear most often!¶
fig = px.bar(prominent_stats)
fig.update_xaxes(tickangle=-45)
fig.update_layout(xaxis_title="Stat", yaxis_title="# Appearances in Top 6",
legend_title="Season", coloraxis_colorbar=dict(title="Season"),
title="Prominent Stats of the 1975-2025 NBA seasons")
fig.show(renderer="notebook")
This graph shows that the 6 most common prominent stats per season are:¶
- Win Shares
- Field Goals Per Game
- Points Per Game
- Offensive Win Shares
- Free-Throws Per Game
- Tie between: PER (Player Efficiency Rating) and Field Goal Attempts Per Game
These answers are partially surprising, and they show a bias that NBA fans, media, and players have towards offensive production over defensive production. Nearly all of the statistics in the top 6 for this season and for all of the seasons combined are offense-based, such as offensive win shares, points per game, and field goals per game. Defensive statistics very rarely appear in the top 6 for a season. The only one shown in the above plot is defensive win shares, which has just three entries, from 2020, 1997, and 1981. This does make sense, though, as it's easy to say that a player who produces a lot of points is an elite player in the NBA and deserves a spot in the All-Star game. This may affect the later questions, though, as they may be biased towards offense-heavy teams.
With this information... we can now move to Research Question #2!¶
RESEARCH QUESTION #2: How do prominent statistics transfer to team success?¶
Now that we have the prominent statistics for each season, we can check if the teams that are highest in specific stats are finding more team success compared to teams that aren't.
The way I will look at team success is through wins, if the team made the playoffs or not, and championships!
First, though, I have to make some changes for comparing individual statistics to team statistics, as some individual statistics do not exist in the team data and have to be matched to a team statistic equivalent. These are:
- ws_48 -> w in teams.
- obpm and ows -> o_rtg in teams.
- dbpm and dws -> d_rtg in teams.
- bpm and ws -> n_rtg in teams.
- mp -> mp_per_game in teams.
- trb_percent -> trb_per_game in teams.
- ast_percent -> ast_per_game in teams.
- experience -> age in teams.
- usg_percent -> pace in teams.
- per -> e_fg_percent in teams.
- gs -> mp_per_game in teams.
To account for this, I made a dictionary to translate individual statistics to team statistics.
stat_translations = {"ws_48": "w", "obpm": "o_rtg", "ows": "o_rtg", "dbpm": "d_rtg",
"dws": "d_rtg", "bpm": "n_rtg", "ws": "n_rtg", "mp": "mp_per_game",
"trb_percent": "trb_per_game", "ast_percent": "ast_per_game",
"experience": "age", "usg_percent": "pace", "per": "e_fg_percent",
"gs": "mp_per_game"}
Part 1: Rank teams by Prominent Stats¶
The first thing I want to do, before we check for team success, is rank each team for each season based on the prominent stats for that season. We can then see which teams are excelling in the main individual statistics that are seen in All-Star players, which will then be used to compare to actual team success that is based on wins, playoffs, and championships.
get_team_prominence¶
This program, as stated above, ranks each team by the prominent statistics, which can then be used to compare with overall team success!
def get_team_prominence(team_data, prominents, season):
    """
    Given an NBA team dataset, a prominent statistic sheet, and a season, returns a sheet of
    the rankings of each team for each prominent statistic from 1-30, where 1 is performing
    highest in the ranking and 30 the lowest. Individual statistics are mapped to team
    statistics through the global stat_translations dictionary.
    """
    curr_season = team_data[(team_data["season"] == season) &
                            (team_data["team"] != "League Average")]  # Remove League Average row
    curr_season = curr_season.set_index("team")
    result = pd.DataFrame(index=curr_season.index)
    curr_prominents = prominents[season]
    for prominent in curr_prominents:
        placement = 1
        if prominent in stat_translations:
            prominent = stat_translations[prominent]
        curr_order = curr_season[prominent].sort_values(ascending=False)
        for team in curr_order.index:
            result.loc[team, f'{season}_{prominent}'] = placement
            placement += 1
    return result
# TEST PORTION
test1 = get_team_prominence(team_test_data, test_prominent_stats, 2024)
test2 = get_team_prominence(team_test_data, test_prominent_stats, 2023)
test_team_prominence = test1.merge(test2, left_index=True, right_index=True)
assert test_team_prominence.loc["Boston Celtics", "2024_o_rtg"] == 1
assert test_team_prominence.loc["Washington Wizards", "2023_pace"] == 5
assert test_team_prominence.loc["San Antonio Spurs", "2024_trb_per_game"] == 2
assert test_team_prominence.loc["Miami Heat", "2023_ts_percent"] == 4
assert test_team_prominence.loc["Minnesota Timberwolves", "2024_n_rtg"] == 2
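The ranking step can also be traced by hand on the 2024 test teams. The sketch below translates a player stat to its team column with a dictionary lookup and then ranks the teams, using ratings copied from the team test table. The `LOWER_IS_BETTER` set and the `higher_is_better` flag are hypothetical additions for this illustration: the function above sorts every column in descending order, while `d_rtg` is described earlier as "lower the better", so a direction flag is one possible way to handle that case.

```python
# Subset of the notebook's stat_translations dictionary.
stat_translations = {"ows": "o_rtg", "dws": "d_rtg", "ws": "n_rtg"}

# Hypothetical: team columns where a lower value means a better team.
LOWER_IS_BETTER = {"d_rtg"}

# 2024 ratings copied from the team test table.
team_values_2024 = {
    "o_rtg": {"Boston Celtics": 123.2, "Detroit Pistons": 109.7, "Miami Heat": 114.0,
              "Minnesota Timberwolves": 115.6, "San Antonio Spurs": 110.0,
              "Washington Wizards": 110.5},
    "d_rtg": {"Boston Celtics": 111.6, "Detroit Pistons": 118.8, "Miami Heat": 112.2,
              "Minnesota Timberwolves": 109.0, "San Antonio Spurs": 116.4,
              "Washington Wizards": 119.6},
}


def rank_teams(values, higher_is_better=True):
    """Map each team to its placement, where 1 is the best team."""
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    return {team: place for place, team in enumerate(ordered, start=1)}


def rank_for_player_stat(player_stat):
    """Translate a player stat to its team column, then rank the teams."""
    column = stat_translations.get(player_stat, player_stat)
    return rank_teams(team_values_2024[column], column not in LOWER_IS_BETTER)


offense_ranks = rank_for_player_stat("ows")
defense_ranks = rank_for_player_stat("dws")
print(offense_ranks["Boston Celtics"], defense_ranks["Minnesota Timberwolves"])
```

The offensive ranking agrees with the first assertion above (Boston is first in `o_rtg`). With the direction flag, Minnesota's league-best 109.0 defensive rating places first instead of last among the six test teams.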
Great! Now, we can graph and visualize these rankings season by season! Let's look at the 2025 season's teams:¶
curr_data = get_team_prominence(team_stats, prominent_stats, 2025)
curr_data.columns = [column[5:] for column in curr_data.columns]
fig = go.Figure()
for team in curr_data.index:
    fig.add_trace(go.Scatter(
        x=curr_data.columns,
        y=curr_data.loc[team].values,
        mode="lines+markers",
        name=team
    ))
fig.update_yaxes(autorange="reversed")
fig.update_layout(title="2025 Teams Scaled by Prominent Statistics", yaxis_range=[1, 30],
xaxis_title="Prominent Stat", yaxis_title="Team Placement",
legend_title="Team")
fig.show(renderer="notebook")
As seen above, there is a slight trend with some teams. If you look at a team like the Brooklyn Nets, they are near the bottom, whereas a team like the Memphis Grizzlies are near the top for all of the statistics!¶
Now, we need to get the success for each team per year!¶
Part 2: Rank Teams by Team Success¶
As said earlier, I am valuing team success per season by:
- The amount of wins in a season (1 point per win)
- If the team made the playoffs (5 points if true)
- If the team won a championship (10 points if true - for 2025, we will give half points for the Oklahoma City Thunder and the Indiana Pacers, as the victor is not decided yet).
With that being said, let's rank these teams!
get_season_success¶
This program will get the success score for each team in a season and will return it as a series of scores! We will then be able to compare each season success scores to the rankings in the prominent stats!
def get_season_success(team_data, champions, season):
    """
    Given an NBA team dataset, NBA champions, and a season, returns a series representing the
    "success scores" for each NBA team for that season, based on wins, playoff appearance, and
    championships. If the year is 2025, give 5 points to the two current finalists as the champion
    isn't decided as of 6/9/2025
    """
    curr_season = team_data[(team_data["season"] == season) &
                            (team_data["team"] != "League Average")]  # Remove League Average Row
    curr_season = curr_season.set_index("team")
    scores = pd.Series(index=curr_season.index)
    for team in scores.index:
        team_score = curr_season.loc[team, "w"]
        if team in curr_season[curr_season["playoffs"]].index:
            team_score += 5
        scores[team] = team_score
    if season == 2025:
        scores["Indiana Pacers"] = scores["Indiana Pacers"] + 5
        scores["Oklahoma City Thunder"] = scores["Oklahoma City Thunder"] + 5
    else:
        curr_champ = champions[champions["Year"] == season]["NBA Champion"]
        scores[curr_champ] = scores[curr_champ] + 10
    return scores
# TEST PORTION
test_season_success = get_season_success(team_test_data, nba_champions, 2024)
assert test_season_success["Boston Celtics"] == 79
assert test_season_success["Detroit Pistons"] == 14
assert test_season_success["Miami Heat"] == 51
assert test_season_success["Minnesota Timberwolves"] == 61
assert test_season_success["San Antonio Spurs"] == 22
assert test_season_success["Washington Wizards"] == 15
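The scoring rule behind these assertions is simple enough to check by hand. The sketch below restates it as a standalone function; the name `success_score` is made up for this illustration. The win totals are inferred from the asserted scores together with the `playoffs` column of the team test table (for example, Boston's 79 is 64 wins, plus 5 for the playoffs, plus 10 for the title).

```python
def success_score(wins, made_playoffs, won_title):
    """1 point per win, 5 for a playoff appearance, 10 for a championship."""
    return wins + (5 if made_playoffs else 0) + (10 if won_title else 0)


# (wins, made_playoffs, won_title) for the 2024 test teams.
teams_2024 = {
    "Boston Celtics": (64, True, True),
    "Detroit Pistons": (14, False, False),
    "Miami Heat": (46, True, False),
    "Minnesota Timberwolves": (56, True, False),
    "San Antonio Spurs": (22, False, False),
    "Washington Wizards": (15, False, False),
}

scores_2024 = {team: success_score(*record) for team, record in teams_2024.items()}
print(scores_2024)
```

Because a playoff berth and a title add at most 15 points, the score is dominated by regular-season wins, which is worth keeping in mind when reading the comparisons that follow.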
With this new program, let's try to visualize the scores with a season such as the 2017 season!¶
graph_data = get_season_success(team_stats, nba_champions, 2017)
fig = px.bar(graph_data, color=graph_data.values, color_continuous_scale='sunsetdark',
title="2017 NBA Team Success Scores")
fig.update_xaxes(tickangle=-45)
fig.update_layout(xaxis_title="Team", yaxis_title="Success Score")
fig.show(renderer="notebook")
As seen by the graph, the champion Golden State Warriors take charge, as they had the most success, followed by the high-win San Antonio Spurs.¶
Great! Now that we have figured out the scores for each team, let's compare these scores to the average of the prominence rankings for the 30 teams!¶
Part 3: Compare the Season Scores with the Prominent Statistic Rankings.¶
We have both the Season Scores and the Prominent Statistics. Now, in order to figure out if the two correlate, we can look at the season scores and compare them to the average ranking in the prominent statistics. By graphing this, we can see if there is a correlation between the two, and our research question 2 will be answered.
Scatter Plot¶
While this isn't a method, we will take our prominent rankings for a season and get the average ranking for each team as a series. This will give a general sense of which teams are great at the prominent individual stats seen in All-Stars, then we graph that along with the season scores.
fig = go.Figure()
for season in range(2025, 1974, -1):
    curr_team_prom = get_team_prominence(team_stats, prominent_stats, season).mean(axis=1)
    curr_szn_success = get_season_success(team_stats, nba_champions, season)
    fig.add_trace(go.Scatter(y=curr_szn_success.values, x=curr_team_prom.values, mode="markers",
                             name=season))
fig.update_layout(title="Season Success VS Team Prominence", xaxis_title="Prominent Ranking",
yaxis_title="Season Score", legend_title="Season")
fig.update_xaxes(autorange="reversed")
fig.show(renderer="notebook")
Part 4: Research Question #2 Conclusion¶
As seen above, there is a noticeable correlation between a team's success score for a season and its average prominent stat ranking for the same season. It's important to notice the abundance of dots in the lower left and upper right corners, and the lack of dots in the lower right and upper left corners. Because the x-axis is reversed (better rankings sit on the right), this shows a positive relationship: if a team is doing well in the prominent stats for that season, it is likely to find team success.
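One way to put a number on a trend like this is a rank correlation. The sketch below is not part of the original analysis: it computes a Spearman coefficient in plain Python for the six 2024 test teams, pairing each team's `o_rtg` placement among those teams (derived from the team test table) with its success score from the test assertions. The helper names are assumptions, and the ranking helper assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Order: Celtics, Pistons, Heat, Timberwolves, Spurs, Wizards (2024).
o_rtg_placement = [1, 6, 3, 2, 5, 4]       # 1 = best offensive rating
success_scores = [79, 14, 51, 61, 22, 15]  # from the test assertions

rho = spearman(o_rtg_placement, success_scores)
print(round(rho, 3))
```

The coefficient comes out strongly negative because a placement of 1 is the best: a smaller placement number goes with a larger success score. Six teams and one statistic are far too few to draw conclusions from, so this only shows the mechanics; the same calculation could be applied to each season's average placement and success scores.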
Now that we know the individual statistics that make All-Stars elite also correlate to team success, let's move onto Research Question #3!¶
RESEARCH QUESTION #3: Do teams need All-Stars in order to find season success?¶
So far, we have concluded the statistics that separate All-Stars from the ordinary player in the NBA, and concluded that these stats in teams correlate directly to team success. Now, we must check the number of All-Stars for each team in a season and compare that to the success that the team had that season. Let's get started!
Part 1: Get All-Stars For Each Team in a Season¶
The first thing we will have to do is find a way to get each team in the season's counts of All-Stars. To do this, we can write a simple program.
get_team_all_stars¶
This program will take in a season and the necessary datasheets and return a series of all the counts of All-Stars for each team. This will be great for the future, as we can then check each team throughout every season and get the number of All-Stars easily!
def get_team_all_stars(stars, player_data, team_data, season):
    """
    Given datasheets of All-Stars, player statistics, team statistics, and a season, returns a
    series representing the counts of All-Stars for each team that season. If a team had no
    All-Stars in the season, give them a value of 0 rather than N/A
    """
    curr_teams = team_data[(team_data["season"] == season) &
                           (team_data["team"] != "League Average")]
    curr_teams = curr_teams.set_index("team")
    result = pd.Series([0]*len(curr_teams.index), index=curr_teams.index)
    curr_players = player_data[(player_data["season"] == season) &
                               (player_data["tm"] != "TOT")][["player", "tm"]]
    curr_stars = stars[stars["season"] == season]["player"]
    for star in curr_stars.values:
        curr_abbrevs = curr_players[curr_players["player"] == star]["tm"].values
        for abbrev in curr_abbrevs:  # in the case that an All-Star switched teams mid-season
            curr_team = curr_teams[curr_teams["abbreviation"] == abbrev].index[0]
            result[curr_team] += 1
    return result
# TEST PORTION
# Since the number of All-Stars in a year is small and manageable, use default data
test = get_team_all_stars(all_stars, player_stats, team_stats, 2024)
assert test["Minnesota Timberwolves"] == 2 # Karl-Anthony Towns, Anthony Edwards
assert test["Houston Rockets"] == 0
assert test["Brooklyn Nets"] == 0
assert test["Atlanta Hawks"] == 1 # Trae Young
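The counting logic can be sketched without the datasets using `collections.Counter`. The example below uses only the three 2024 All-Stars named in the test comments above, plus a clearly hypothetical traded player to show that an All-Star who changed teams mid-season is counted once for each team. The function name `count_all_stars` and the input shape (player mapped to a list of teams) are assumptions for this illustration.

```python
from collections import Counter


def count_all_stars(star_teams, all_teams):
    """Count All-Stars per team, giving 0 to teams without any.

    `star_teams` maps each All-Star to the list of teams they played for.
    """
    counts = Counter()
    for teams in star_teams.values():
        for team in teams:  # a traded All-Star counts for every team listed
            counts[team] += 1
    return {team: counts.get(team, 0) for team in all_teams}


teams = ["Atlanta Hawks", "Brooklyn Nets", "Houston Rockets", "Minnesota Timberwolves"]

# 2024 All-Stars named in the test comments above.
stars_2024 = {
    "Anthony Edwards": ["Minnesota Timberwolves"],
    "Karl-Anthony Towns": ["Minnesota Timberwolves"],
    "Trae Young": ["Atlanta Hawks"],
}
result_2024 = count_all_stars(stars_2024, teams)

# Hypothetical player, only to show the mid-season trade case.
with_trade = dict(stars_2024)
with_trade["Hypothetical Traded Star"] = ["Houston Rockets", "Brooklyn Nets"]
result_trade = count_all_stars(with_trade, teams)
print(result_2024, result_trade)
```

Counting a traded All-Star for both teams means the league-wide total can exceed the number of All-Stars selected, which slightly inflates the counts for seasons with such trades.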
Great! Now, let's get a visualization of a season's All-Stars. My personal favorite visualization of this is the 2017 season!¶
fig_data = get_team_all_stars(all_stars, player_stats, team_stats, 2017)
fig = px.bar(fig_data, x=fig_data.index, y=fig_data.values, color=fig_data.values,
color_continuous_scale="sunsetdark", title="2017 All-Stars Per Team")
fig.update_layout(xaxis_title="Team", yaxis_title="# All-Stars")
fig.update_xaxes(tickangle=-45)
fig.show(renderer="notebook")
As can be seen, the Golden State Warriors had 4 players chosen for the All-Star game in 2017, a ridiculous amount!¶
Part 2: Compare With Success Scores (Scatter Plot)¶
This part will be similar to the last step in Research Question #2, we have to represent the comparisons between the All-Star count and success scores using a graph, which will show if there is a correlation or not between having All-Stars on a team and finding success.
Scatter Plot¶
Once again, we will use a scatter plot, where the x values are the success scores, and the y values represent the number of All-Stars. This will give a level-based representation where the distinction between teams is clear based on the All-Star count. Since there is a likelihood of duplicate values, we can scatter duplicate points to get a better representation of the data.
fig = go.Figure()
for season in range(2025, 1974, -1):
    curr_szn_success = get_season_success(team_stats, nba_champions, season)
    curr_all_star_count = get_team_all_stars(all_stars, player_stats, team_stats, season)
    fig.add_trace(go.Scatter(x=curr_szn_success.values, y=curr_all_star_count.values,
                             mode="markers", name=season, opacity=.7))
fig.update_traces(marker_size=8)
fig.update_layout(title="Season Success VS All-Star Counts", xaxis_title="Season Success Score",
yaxis_title="# All-Stars", legend_title="Season", scattermode="group")
fig.update_yaxes(range=[-0.4, 4.4])
fig.update_xaxes(range=[0, 90])
fig.show(renderer="notebook")
As shown in the graph, there is a fairly clear trend between the number of All-Stars a team has versus team success in a season. But there is still some clarification needed.¶
Part 3: Answer¶
The question specifically asks if teams need an All-Star in order to be successful. This isn't necessarily true in all cases. One example is the Denver Nuggets in 2013, who reached a season score of 63 without a single All-Star. This score was even higher than that of the 2011 Boston Celtics, who had 4 All-Stars and a success score of just 61. However, the graph suggests that adding just a single All-Star is enough to push a team's season score up by nearly 10 points and raise the maximum by around 16. So, do teams need an All-Star to find success? No, but having one clearly increases the odds of finding success, with the most successful teams of all time having 2 All-Star players on their roster. This means that, if you see a team that has an All-Star, it's reasonable to expect more success from them compared to a team that lacks one.
This concludes Research Question #3!¶
As a quick summary, here's what we learned through ALL research questions:¶
- Question 1: What are the prominent statistics of an All-Star?
- The six prominent statistics for an All-Star in the 2025 season are:
- Free-Throw Attempts Per Game
- Points Per Game
- Field Goals Per Game
- Field Goal Attempts Per Game
- Win Shares
- Two-Pointers Attempted Per Game
- And the six prominent statistics throughout all the seasons are:
- Win Shares
- Field Goals Per Game
- Points Per Game
- Offensive Win Shares
- Free-Throws Per Game
- Tie between: PER (Player Efficiency Rating) and Field Goal Attempts Per Game
- Question 2: How do prominent statistics transfer to team success?
- While not all prominent stats directly transfer to team success, there is a general trend where the better a team is at the six prominent statistics, the higher chance they find success in the season.
- This means that individual statistics do impact the game at a team-level as well!
- Question 3: Do teams need All-Stars in order to find season success?
- The short answer is no, teams do not need an All-Star in order to find success.
- However, teams that do have an All-Star have a higher chance of being exceptionally good, and are more likely to find success than a team without an All-Star.
- More specifically, the most successful teams had 2 All-Stars, while teams with 4 All-Stars had the highest floor for success.
That Concludes the Results!¶
Implications and Limitations¶
I am really proud of this study from both a statistical analysis standpoint and a passion standpoint. I have been watching the NBA for a long, long time, and while this project took a lot of time to complete, it was a lot of fun the entire way. However, there is still more to discuss!
This project contains real statistics that are associated with real players who are still playing today. With that, this study can have some effect on real-world opinions, but there are some factors that need to be considered. Let's get started with the impact and implications of this study!
Implications¶
When looking at this study, there are a few conclusions that have pretty big implications and that can impact many different individuals in both good and bad ways. To start, let's go through the people who may benefit from my analysis and why they would benefit.
The Winners¶
- Fans of the NBA (Both Casual and Non-Casual)
- The way that fans could benefit from this analysis is balanced for both the casual and non-casual fan.
- A casual fan could see the six prominent statistics that I have found for players either in this season or all seasons and use these statistics in order to determine who they vote for in the All-Star game in their current season.
- Since a casual fan probably hasn't seen every player, game, or highlight, checking the stat sheets and finding players with high points per game, high shots made, and high win shares would give them a good impression of which players in the league are performing best.
- For a non-casual fan, one who watches a lot of games and follows the league, they would be able to use their own knowledge in alignment with this study in order to figure out what players they want to vote for in the All-Star game, but they could even go further and use my later research questions to determine if there are teams that they should either bet on, make predictions on, or even root for in an upcoming season.
- The problem with this is that now, a non-casual fan could be risking money based on this study, which could be a problem as I'll explain in the limitations section.
- Media personnel in the NBA
- There have been many times where members of the media for the NBA have been scrutinized for either having bad takes, bias, or incorrect facts. This study could help with this issue, by giving media members a decent basis for judging if a player is good or not.
- An example could be, when it is coming time for media members to vote for the All-Star game, they could use prominent statistics joined with their own eye-test and biases in order to vote for what they think the best 24 players are in a season.
- This could also be beneficial for them because if, say, they got scrutiny for their votes or their picks, they could back it up with actual statistics and research provided by this study.
- NBA players and coaches
- To be very clear, I don't think that NBA players should try to focus only on the stats that are shown as prominent statistics. This would lead to a lot of individual play within teams and would probably hinder success more than anything.
- I believe that NBA players and coaches would find value out of the second and third question, as it can give coaches an idea of roles that players should have on their team.
- As an example, a coach could look at my results from the 3rd question and pick two of the best players on their team to fit into the All-Star mold, where they focus on the prominent statistics and improving their areas in that field.
- They could also use the conclusions from the second question to figure out what needs to be focused on as a team, and what specific style they should push in order to perform better.
- Plus, as with the last two beneficiaries, they can use the prominent statistics and the players performing well in them in order to vote for All-Stars in the current season.
From this study, there is a decent amount of benefit that several different people can have. However, there is still some harm that this study can bring:
The Losers¶
- NBA players and coaches (again)
- The big issue that can come with using this project is, as mentioned above, that players may look at the prominent statistics, focus way too much on individuality in their own game, and possibly hinder their team.
- Another issue is that, while it is easy to say "just improve at these statistics" there is a whole lot more to it and it isn't as simple as flipping a switch.
- This means that a coach can say that their team should focus on a few statistics, like points per game, win shares, and rating, but it comes down to how well their team plays, and they can't simply say "score more points" and expect it to happen.
- Still, I think the benefits outweigh the negatives, but it does depend on how the data is treated by the players and coaches.
- Sports bettors
- There are a lot of sports bettors out in the world who will use any statistic thrown at them to make a bet or to try to get an advantage against the house. This study wouldn't be an exception.
- A sports bettor may see my study, look at the questions regarding team success for both All-Stars and prominent statistics, and then try to predict which team will perform better, win, or even get the championship based on my data. While there was a clear correlation in my data, there isn't a clear 1:1 relationship between team success and prominent statistic rankings/All-Star players.
- If sports bettors attempt to use this study in order to make bets, I can't see them winning in the long run or gaining any benefit that they may think they have.
In my opinion, the benefits slightly outweigh the negatives, but I feel like the negative impacts are more likely to affect an ordinary individual than the benefits are. Now, let me explain why I think this study shouldn't be completely trusted for judging player performance and team success.
Limitations¶
Now, while all of this study is backed by real statistics and uses only data, there are still some flaws that affect the validity and accuracy of my conclusions:
- Coincidences in prominent stats
- Occasionally, there were seasons where one of the six prominent stats was a statistic that, when looked at on a team level, has nothing to do with overall success and simply skewed some of the ranking data. This includes but is not limited to:
- Turnovers Per Game
- Age
- Minutes Played
- These statistics aren't great in discerning if a team is good or not, with one of these (turnovers per game) being a stat that teams shouldn't want, but was a prominent statistic for a few seasons.
- These statistics led to rankings that were nowhere near correlated with success scores, and led to slightly skewed data when plotting.
- Overall, though, most of the prominent statistics had relevance to performance, and the plotting showed that properly.
- Potential flaws in Season Score calculation
- I only looked at three different statistics when calculating the success scores of teams in different seasons, and the weighting of certain statistics could have been flawed.
- What this may lead to is inconsistencies in Season Scores, where one team may have done a lot better than their score represents, and a team could have done a lot worse than what their score represents as well.
- These scores also don't factor in playoff wins specifically, which are a big thing. A team that makes the Western Conference Finals could have a case for having a better season than a team that had more regular season wins, but lost in the first round.
- Still, I think my scoring method for seasons was pretty solid, and it worked well enough to show my findings.
- Lack of the eye-test
- As a fan of basketball and as a fan of the NBA, I can say without a doubt that ONLY using statistics to determine if a player is good or not is a really, really flawed concept.
- If you are unfamiliar, the eye-test involves judging a player based on how good they seem when watching them play. If a player looks like they're good, then that is their judgment based on the eye-test.
- Since this study involves no actual gameplay and only statistics, anyone who bases their judgment of what makes someone good or bad at basketball on prominent statistics or my analysis is forgetting one of the biggest parts of forming an opinion in the NBA. You have to watch the games and see for yourself which players are good. There are some amazing players in the league whose stats aren't as good as those of some of the stars.
- My favorite example of this is a player like Tyrese Haliburton this year. Let's take a look at his statistics for this season:
tyrese = player_stats[(player_stats["player"] == "Tyrese Haliburton")]
tyrese = tyrese.loc[701, :]
tyrese
seas_id                    32572.0
season                        2025
player_id                   4892.0
player           Tyrese Haliburton
age                           24.0
experience                       5
tm                             IND
g                               73
gs                            73.0
mp_per_game                   33.6
fg_per_game                    6.5
fga_per_game                  13.8
fg_percent                   0.473
x3p_per_game                   3.0
x3pa_per_game                  7.7
x3p_percent                  0.388
x2p_per_game                   3.5
x2pa_per_game                  6.1
x2p_percent                  0.581
e_fg_percent                 0.582
ft_per_game                    2.6
fta_per_game                   3.0
ft_percent                   0.851
orb_per_game                   0.6
drb_per_game                   3.0
trb_per_game                   3.5
ast_per_game                   9.2
stl_per_game                   1.4
blk_per_game                   0.7
tov_per_game                   1.6
pf_per_game                    1.3
pts_per_game                  18.6
mp                          2451.0
per                           21.8
ts_percent                   0.616
x3p_ar                       0.559
f_tr                         0.221
orb_percent                    1.9
drb_percent                    9.7
trb_percent                    5.9
ast_percent                   38.9
stl_percent                    2.1
blk_percent                    1.8
tov_percent                    9.8
usg_percent                   21.6
ows                            8.1
dws                            2.3
ws                            10.4
ws_48                        0.204
obpm                           5.7
dbpm                           0.2
bpm                            5.8
Name: 701, dtype: object
- Right now, these are just numbers, but there is a bigger story behind these.
- Tyrese Haliburton was an All-Star this year, but let's compare his stats to the prominence thresholds for this season:
tyrese_stats = tyrese.drop(["seas_id", "season", "player_id", "player", "tm", "mp"])
curr_season = season_thresholds[2025].drop(["mp"])
fig = go.Figure(data=[
    go.Bar(name='Tyrese', x=tyrese_stats.index,
           y=tyrese_stats.values),
    # The thresholds mark the top 5% of players (the 95th percentile) per stat
    go.Bar(name='Top 5% Threshold', x=curr_season.index,
           y=curr_season.values)
])
fig.update_layout(barmode='group', yaxis_title="# Of Stat", xaxis_title="Stat",
title_text="Prominent Thresholds vs Tyrese Haliburton")
fig.update_xaxes(tickangle=-45)
fig.show(renderer="notebook")
- It can be seen that, while he is above the prominence threshold for some common prominent stats such as win shares and box plus-minus, he is below the threshold for some of the most common stats that fans look at such as points per game, field goals per game, and free-throws per game.
- However, if you watch Tyrese Haliburton, especially in the 2025 playoffs, he is arguably one of the best players in the world at his position and is currently leading a Finals run for the Indiana Pacers.
- A casual fan may see the basic statistics, such as points per game, field goals per game, and so on, and think that Tyrese isn't completely fit to be an All-Star due to being below the threshold in these categories, when he completely deserves a spot.
- This basic example is a phenomenon that can be seen with any player, especially players who excel defensively and off-ball.
- This study, and the voting of All-Stars, is fairly biased towards offensive production, which is why I think a study like this shouldn't be the only thing fans look at for voting, and instead they should watch the players themselves and judge based on their own opinion on their performance.
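The threshold comparison described above can also be done numerically instead of by eye. The following is a rough sketch, not the notebook's code: the `player` values are copied from the Haliburton row printed earlier, but the `thresholds` values are placeholders invented for illustration (the real ones live in `season_thresholds[2025]`), and `compare_to_thresholds` is a hypothetical helper.

```python
import pandas as pd

# Values copied from the Haliburton row printed earlier in this section.
player = pd.Series({"pts_per_game": 18.6, "fg_per_game": 6.5,
                    "ft_per_game": 2.6, "ws": 10.4, "bpm": 5.8})

# PLACEHOLDER thresholds, invented for illustration only (not the real values).
thresholds = pd.Series({"pts_per_game": 24.0, "fg_per_game": 8.5,
                        "ft_per_game": 4.5, "ws": 8.0, "bpm": 4.0})

def compare_to_thresholds(player: pd.Series, thresholds: pd.Series) -> pd.DataFrame:
    """Line up a player's stats with thresholds and flag which ones are met."""
    # Only compare stats that exist in both Series.
    shared = player.index.intersection(thresholds.index)
    table = pd.DataFrame({
        "player": player.loc[shared],
        "threshold": thresholds.loc[shared],
    })
    table["meets_threshold"] = table["player"] >= table["threshold"]
    return table

result = compare_to_thresholds(player, thresholds)
print(result)
```

A table like this makes it easy to see at a glance which stats a player clears, which is harder to read off a grouped bar chart when the stats sit on very different scales.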
To conclude, while I think this study was a success and the results actually provide some meaningful insight, I do think that those that view this study and base their NBA opinions on this study should be careful due to some of the limitations I explained. However, I still think it is a general rule of thumb that teams with All-Star players will do better, and that the stats that All-Stars excel at compared to the rest of the league are also valuable stats to have on a team. Thank you for viewing this project, and I hope you come away from this with a little bit more NBA knowledge if you had none beforehand, and hopefully a new appreciation for statistics and what makes All-Stars valuable. Thank you!
- Joey Reitz